Adapter Generation for Extracting and Querying Data from Web Sources

Authors

  • Kai-Uwe Sattler
  • Michael Höding
Abstract

Accessing and integrating data from heterogeneous sources has become a significant challenge. So-called adapters provide the functionality for translating SQL queries into queries understandable by the source as well as converting the results into a common model. In this paper, we present our approach to an adapter for Web sources, which is configurable by specifying a source-specific extraction function. We focus on two main tasks: query modification in order to extend the source capabilities, and data extraction. The extraction step is based on an operational description that enables an interactive exploration of the result format during the development phase. Finally, we present our ideas for semi-automatic discovery of extraction patterns by analyzing example documents.
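The two tasks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `WebAdapter`, the functions `extract_rows` and `fake_fetch`, the HTML layout, and the attribute names are all hypothetical and chosen only for illustration. The sketch assumes that the source can evaluate only some query conditions, so the adapter sends those to the source and filters on the remaining ones locally, and that a source-specific extraction function turns the returned page into rows.

```python
import re
from typing import Callable, Dict, List, Set

Row = Dict[str, str]


def extract_rows(html: str) -> List[Row]:
    """Hypothetical source-specific extraction function.

    Assumes each result is rendered as <tr><td>title</td><td>year</td></tr>.
    """
    pattern = re.compile(
        r"<tr><td>(?P<title>[^<]*)</td><td>(?P<year>[^<]*)</td></tr>"
    )
    return [m.groupdict() for m in pattern.finditer(html)]


class WebAdapter:
    """Illustrative adapter: pushes supported conditions to the source
    and evaluates the remaining (residual) conditions locally."""

    def __init__(
        self,
        fetch: Callable[[Dict[str, str]], str],
        extract: Callable[[str], List[Row]],
        supported: Set[str],
    ) -> None:
        self.fetch = fetch          # sends a query to the source, returns a page
        self.extract = extract      # source-specific extraction function
        self.supported = supported  # attributes the source itself can filter on

    def query(self, conditions: Dict[str, str]) -> List[Row]:
        # Split equality conditions into those the source understands
        # and those the adapter must apply after extraction.
        pushed = {k: v for k, v in conditions.items() if k in self.supported}
        residual = {k: v for k, v in conditions.items() if k not in self.supported}
        rows = self.extract(self.fetch(pushed))
        return [
            row for row in rows
            if all(row.get(k) == v for k, v in residual.items())
        ]


def fake_fetch(params: Dict[str, str]) -> str:
    """Stand-in for an HTTP request; records what was sent to the source."""
    fake_fetch.last_params = dict(params)
    return (
        "<table>"
        "<tr><td>Paper A</td><td>1998</td></tr>"
        "<tr><td>Paper B</td><td>1999</td></tr>"
        "</table>"
    )


adapter = WebAdapter(fake_fetch, extract_rows, supported={"author"})
result = adapter.query({"author": "Sattler", "year": "1999"})
```

Here only the `author` condition reaches the (simulated) source, while the `year` condition is applied by the adapter after extraction. A real adapter would accept SQL and issue HTTP requests; both are left out to keep the sketch self-contained.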

Similar resources

Adapter Generation for Extracting and Querying Data from Web Sources

Accessing and integrating data from heterogeneous sources has become a significant challenge. So-called adapters provide the functionality for translating SQL queries into queries understandable by the source as well as converting the results into a common model. In this paper, we present our approach to an adapter for Web sources, which is configurable by specifying a source-specific extraction...

Developing a BIM-based Spatial Ontology for Semantic Querying of 3D Property Information

With the growing dominance of complex and multi-level urban structures, current cadastral systems, which are often developed based on 2D representations, are not capable of providing unambiguous spatial information about urban properties. Therefore, the concept of 3D cadastre is proposed to support 3D digital representation of land and properties and facilitate the communication of legal owners...

An XML-enabled data extraction toolkit for web sources

The amount of useful semi-structured data on the web continues to grow at a stunning pace. Often interesting web data are not in database systems but in HTML pages, XML pages, or text files. Data in these formats are not directly usable by standard SQL-like query processing engines that support sophisticated querying and reporting beyond keyword-based retrieval. Hence, the web users or applicat...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

A Framework for Improved Access to Museum Databases in the Semantic Web

Digital museum databases have extremely heterogeneous data structures which require advanced mapping and vocabulary integration for them to benefit from the interoperability enabled by semantic technologies. In addition to establishing ways of extracting and manipulating digitally encoded cultural material, there exists a need to make this material available and accessible to human users in dif...

Publication date: 1999